@Yashagarwal9798
Contributor

Summary

This PR fixes the performance regression where @fastmath x^2 for Float32 was not being inlined to efficient LLVM code, unlike Float64.

Problem

As reported in #60639, @fastmath x^2 for Float32 was falling back to power_by_squaring instead of using the LLVM powi intrinsic. This resulted in:

  • Unnecessary function calls instead of inline multiplication
  • Potential type promotion to Float64
  • Suboptimal generated code compared to Float64

Before this fix, @code_llvm @fastmath Float32(1.5)^2 would show calls to power_by_squaring, while Float64 correctly used the llvm.powi intrinsic.
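The comparison described above can be reproduced with a small sketch like the following. The wrapper name `sq` is hypothetical and only there so that each element type gets its own compiled method; what the IR actually shows depends on the Julia version in use.

```julia
using InteractiveUtils  # exports @code_llvm when running outside the REPL

# Hypothetical wrapper (not part of the PR): one generic definition,
# specialized separately for Float32 and Float64 arguments.
sq(x) = @fastmath x^2

# Print the optimized LLVM IR for each specialization. Look for a call to
# power_by_squaring versus an inline fmul or an llvm.powi call.
@code_llvm debuginfo=:none sq(1.5f0)
@code_llvm debuginfo=:none sq(1.5)
```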

Solution

Added the missing pow_fast methods for Float32 and Float16:

  • pow_fast(::Float32, ::Int32) - uses llvm.powi.f32.i32 intrinsic directly
  • pow_fast(::Float32, ::Integer) - wrapper that converts to Int32 when safe, matching the Float64 pattern
  • pow_fast(::Float16, ::Integer) - converts to Float32, computes, and converts back

This mirrors the existing implementation for Float64 which already used llvm.powi.f64.i32.
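As a rough illustration of the three methods listed above, here is a minimal standalone sketch. The names `powi_f32` and `powi_f16` are hypothetical stand-ins (the real methods are `pow_fast` in `Base.FastMath`), and the sketch assumes a Julia version whose LLVM accepts the `llvm.powi.f32.i32` intrinsic name; it is not the code that was merged.

```julia
# Hypothetical stand-in for pow_fast(::Float32, ::Int32):
# call the LLVM powi intrinsic directly.
powi_f32(x::Float32, y::Int32) =
    ccall("llvm.powi.f32.i32", llvmcall, Float32, (Float32, Int32), x, y)

# Hypothetical stand-in for pow_fast(::Float32, ::Integer):
# use the intrinsic only when the exponent fits in an Int32.
@inline function powi_f32(x::Float32, y::Integer)
    z = y % Int32                     # truncating conversion to Int32
    z == y ? powi_f32(x, z) : x^y     # otherwise fall back to the generic power
end

# Hypothetical stand-in for pow_fast(::Float16, ::Integer):
# widen to Float32, compute, and narrow back.
powi_f16(x::Float16, y::Integer) = Float16(powi_f32(Float32(x), y))

println(powi_f32(1.5f0, 2))
println(powi_f16(Float16(1.5), 3))
```

Routing `Float16` through `Float32` keeps the sketch on a single intrinsic overload rather than calling a half-precision `powi` directly.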

Testing

Added a regression test that verifies @fastmath x^2 generates inline fmul instructions (not power_by_squaring calls) for Float16, Float32, and Float64.
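A check of this kind could look roughly like the sketch below. The helper names `square_fast` and `inlines_square` are hypothetical, and the string matching on the IR is an assumption about how such a test might be written, not the test that was added to the repository.

```julia
using InteractiveUtils  # provides code_llvm

# Hypothetical function under test.
square_fast(x) = @fastmath x^2

# Hypothetical helper: true when the optimized IR for square_fast on type T
# contains a floating-point multiply and no reference to power_by_squaring.
function inlines_square(::Type{T}) where {T}
    ir = sprint(io -> code_llvm(io, square_fast, Tuple{T}; debuginfo=:none))
    return occursin("fmul", ir) && !occursin("power_by_squaring", ir)
end

for T in (Float16, Float32, Float64)
    println(T, " => ", inlines_square(T))
end
```

The printed values depend on the Julia build: on a version without this fix, the `Float32` entry would be expected to report `false`.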

Fixes #60639

@adienes
Member

adienes commented Jan 12, 2026

@Yashagarwal9798 please be aware that all uses of AI must be disclosed

@oscardssmith added the performance, maths, and merge me labels Jan 12, 2026
@DilumAluthge
Member

Build failure looks real?

error during bootstrap:
LoadError("sysimg.jl", 3, LoadError("Base.jl", 222, LoadError("fastmath.jl", 300, UndefVarError(:IEEEFloat, 0x0000000000005e69, Base.FastMath))))

@DilumAluthge added the failing label and removed the merge me label Jan 12, 2026
@DilumAluthge
Member

@oscardssmith CI is all green now.

@oscardssmith merged commit f34d5f2 into JuliaLang:master Jan 14, 2026
8 checks passed
@oscardssmith
Member

@Yashagarwal9798 thanks for the PR!

@Yashagarwal9798
Contributor Author

Thanks! Really glad I could help improve the project.

@eschnett
Contributor

This is a candidate for backporting. Although this concerns only an optimization, generating inefficient code for x^2 is a disaster performance-wise.

@oscardssmith
Member

Agreed. If we backport, we need to be careful not to backport the Float16 version of this to 1.12 (LLVM only recently stopped producing garbage on x86 with this intrinsic). 1.13 should receive the backport unmodified, though. I'll put up the 1.12 PR.

@oscardssmith added the backport 1.13 label Jan 14, 2026
@oscardssmith changed the title to "Fix @fastmath x^2 inlining regression for Float32 and Float16" Jan 14, 2026
KristofferC pushed a commit that referenced this pull request Jan 26, 2026
Co-authored-by: Oscar Smith <[email protected]>
(cherry picked from commit f34d5f2)
@KristofferC KristofferC mentioned this pull request Jan 26, 2026
43 tasks
@KristofferC removed the backport 1.13 label Feb 4, 2026

Labels

failing, maths, performance

Development

Successfully merging this pull request may close these issues.

power_by_squaring is not inlined for x^2

8 participants